Dataset statistics
| Number of variables | 18 |
|---|---|
| Number of observations | 132 |
| Missing cells | 81 |
| Missing cells (%) | 3.4% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 18.7 KiB |
| Average record size in memory | 145.0 B |
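The summary figures above can be cross-checked against other numbers in this report. The sketch below is only a sanity check and not part of pandas-profiling; the variable names are illustrative, and the per-variable missing counts are copied from the "Missing" warnings of this report.

```python
# Cross-check the "Dataset statistics" table from other figures in the report.
n_variables = 18
n_observations = 132

# Missing values per variable, as listed in the report's "Missing" warnings.
missing_per_variable = {
    "Rate_Code": 12,
    "store_and_forward": 33,
    "mta_tax": 10,
    "Total_Amt": 26,
}

total_cells = n_variables * n_observations
missing_cells = sum(missing_per_variable.values())
missing_pct = 100 * missing_cells / total_cells

# Total memory implied by the average record size (145.0 B per row).
total_kib = 145.0 * n_observations / 1024

print(f"Missing cells: {missing_cells} ({missing_pct:.1f}%)")
print(f"Total size in memory: {total_kib:.1f} KiB")
```

The missing-cell percentage is taken over all cells (variables times observations), not over rows.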
Variable types
| NUM | 11 |
|---|---|
| CAT | 6 |
| BOOL | 1 |
Warnings
| Warning | Type |
|---|---|
| Start_Lat has a high cardinality: 72 distinct values | High cardinality |
| End_Lon is highly correlated with Start_Lon and 2 other fields | High correlation |
| Start_Lon is highly correlated with End_Lon and 2 other fields | High correlation |
| End_Lat is highly correlated with Start_Lon and 2 other fields | High correlation |
| mta_tax is highly correlated with Start_Lon and 2 other fields | High correlation |
| Total_Amt is highly correlated with Fare_Amt | High correlation |
| Fare_Amt is highly correlated with Total_Amt | High correlation |
| Rate_Code has 12 (9.1%) missing values | Missing |
| store_and_forward has 33 (25.0%) missing values | Missing |
| mta_tax has 10 (7.6%) missing values | Missing |
| Total_Amt has 26 (19.7%) missing values | Missing |
| Trip_Pickup_DateTime has unique values | Unique |
| Trip_Dropoff_DateTime has unique values | Unique |
| Trip_Distance has 22 (16.7%) zeros | Zeros |
| Start_Lon has 2 (1.5%) zeros | Zeros |
| End_Lon has 2 (1.5%) zeros | Zeros |
| End_Lat has 2 (1.5%) zeros | Zeros |
| surcharge has 70 (53.0%) zeros | Zeros |
| Tip_Amt has 41 (31.1%) zeros | Zeros |
| Tolls_Amt has 69 (52.3%) zeros | Zeros |
| Total_Amt has 18 (13.6%) zeros | Zeros |
Reproduction
| Analysis started | 2020-12-31 01:51:49.509714 |
|---|---|
| Analysis finished | 2020-12-31 01:52:09.295094 |
| Duration | 19.79 seconds |
| Software version | pandas-profiling v2.9.0 |
| Download configuration | config.yaml |
vendor_name
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | 3.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.0 KiB |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 53 | 40.2% | |
| CMT | 39 | 29.5% | |
| VTS | 32 | 24.2% | |
| 2 | 7 | 5.3% | |
| DDS | 1 | 0.8% |
Frequencies of value counts
Unique
| Unique | 1 |
|---|---|
| Unique (%) | 0.8% |
Histogram of lengths of the category
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 2.090909091 |
| Min length | 1 |
Trip_Pickup_DateTime
Categorical
| Distinct | 132 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.0 KiB |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2012-09-01 05:35:00 | 1 | 0.8% | |
| 2013-03-01 00:00:04 | 1 | 0.8% | |
| 2017-04-01 00:00:00 | 1 | 0.8% | |
| 2017-02-03 02:03:50 | 1 | 0.8% | |
| 2014-08-16 14:58:49 | 1 | 0.8% | |
| 2009-04-08 12:19:00 | 1 | 0.8% | |
| 2019-03-01 00:24:41 | 1 | 0.8% | |
| 2018-10-01 00:23:34 | 1 | 0.8% | |
| 2014-01-09 20:45:25 | 1 | 0.8% | |
| 2013-08-26 15:33:22 | 1 | 0.8% | |
| Other values (122) | 122 | 92.4% |
Frequencies of value counts
Unique
| Unique | 132 |
|---|---|
| Unique (%) | 100.0% |
Histogram of lengths of the category
Length
| Max length | 19 |
|---|---|
| Median length | 19 |
| Mean length | 19 |
| Min length | 19 |
Trip_Dropoff_DateTime
Categorical
| Distinct | 132 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.0 KiB |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2017-03-09 21:44:20 | 1 | 0.8% | |
| 2009-10-26 13:17:00 | 1 | 0.8% | |
| 2019-10-01 00:55:17 | 1 | 0.8% | |
| 2019-09-01 00:25:46 | 1 | 0.8% | |
| 2019-05-01 00:37:27 | 1 | 0.8% | |
| 2017-09-01 00:18:49 | 1 | 0.8% | |
| 2013-05-01 00:12:00 | 1 | 0.8% | |
| 2019-09-01 00:57:54 | 1 | 0.8% | |
| 2018-01-01 00:24:23 | 1 | 0.8% | |
| 2011-04-29 05:55:00 | 1 | 0.8% | |
| Other values (122) | 122 | 92.4% |
Frequencies of value counts
Unique
| Unique | 132 |
|---|---|
| Unique (%) | 100.0% |
Histogram of lengths of the category
Length
| Max length | 19 |
|---|---|
| Median length | 19 |
| Mean length | 19 |
| Min length | 19 |
Passenger_Count
Real number (ℝ≥0)
| Distinct | 6 |
|---|---|
| Distinct (%) | 4.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.318181818 |
|---|---|
| Minimum | 1 |
| Maximum | 6 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.0 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 3.45 |
| Maximum | 6 |
| Range | 5 |
| Interquartile range (IQR) | 0 |
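The 95-th percentile of 3.45 is not an observed passenger count; it is consistent with linear interpolation between neighbouring order statistics. A minimal sketch, assuming that interpolation rule (the default in NumPy and pandas) and using the value counts reported for Passenger_Count in this report. `quantile_linear` is an illustrative helper, not a pandas-profiling function.

```python
import math

def quantile_linear(sorted_values, q):
    """Quantile by linear interpolation between neighbouring order statistics."""
    position = q * (len(sorted_values) - 1)
    lower = math.floor(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

# Value counts for Passenger_Count, as reported in this profile.
counts = {1: 113, 2: 10, 3: 2, 4: 2, 5: 3, 6: 2}
values = sorted(v for v, c in counts.items() for _ in range(c))

for q in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(q, round(quantile_linear(values, q), 2))
```

With 132 observations the 0.95 quantile falls at position 124.45, between a value of 3 and a value of 4, which is where the fractional result comes from.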
Descriptive statistics
| Standard deviation | 0.9596292368 |
|---|---|
| Coefficient of variation (CV) | 0.7279945934 |
| Kurtosis | 12.33705966 |
| Mean | 1.318181818 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 3.531159149 |
| Sum | 174 |
| Variance | 0.920888272 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=6)
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 113 | 85.6% | |
| 2 | 10 | 7.6% | |
| 5 | 3 | 2.3% | |
| 6 | 2 | 1.5% | |
| 4 | 2 | 1.5% | |
| 3 | 2 | 1.5% |
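The descriptive statistics for Passenger_Count can be reproduced from these value counts. The sketch below assumes the sample variance (dividing by n - 1), which is the convention that matches the reported figures; the variable names are illustrative.

```python
import math

# Value counts for Passenger_Count, copied from the table above.
counts = {1: 113, 2: 10, 5: 3, 6: 2, 4: 2, 3: 2}

n = sum(counts.values())  # number of observations
total = sum(value * count for value, count in counts.items())
mean = total / n

# Sample variance: squared deviations from the mean, divided by n - 1.
variance = sum(count * (value - mean) ** 2 for value, count in counts.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean  # coefficient of variation

print(f"Sum={total}  Mean={mean:.9f}  Variance={variance:.9f}")
print(f"Std={std:.9f}  CV={cv:.9f}")
```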
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 113 | 85.6% | |
| 2 | 10 | 7.6% | |
| 3 | 2 | 1.5% | |
| 4 | 2 | 1.5% | |
| 5 | 3 | 2.3% |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 6 | 2 | 1.5% | |
| 5 | 3 | 2.3% | |
| 4 | 2 | 1.5% | |
| 3 | 2 | 1.5% | |
| 2 | 10 | 7.6% |
Trip_Distance
Real number (ℝ≥0)
| Distinct | 78 |
|---|---|
| Distinct (%) | 59.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.239469697 |
|---|---|
| Minimum | 0 |
| Maximum | 20 |
| Zeros | 22 |
| Zeros (%) | 16.7% |
| Memory size | 1.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0.4975 |
| median | 1.42 |
| Q3 | 2.7575 |
| 95-th percentile | 5.87 |
| Maximum | 20 |
| Range | 20 |
| Interquartile range (IQR) | 2.26 |
Descriptive statistics
| Standard deviation | 2.988884577 |
|---|---|
| Coefficient of variation (CV) | 1.334639438 |
| Kurtosis | 15.41844051 |
| Mean | 2.239469697 |
| Median Absolute Deviation (MAD) | 1.1 |
| Skewness | 3.447411751 |
| Sum | 295.61 |
| Variance | 8.933431014 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 22 | 16.7% | |
| 1.2 | 6 | 4.5% | |
| 2 | 4 | 3.0% | |
| 0.4 | 4 | 3.0% | |
| 0.5 | 3 | 2.3% | |
| 1.1 | 3 | 2.3% | |
| 2.5 | 3 | 2.3% | |
| 1.5 | 3 | 2.3% | |
| 2.6 | 3 | 2.3% | |
| 0.8 | 3 | 2.3% | |
| Other values (68) | 78 | 59.1% |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 22 | 16.7% | |
| 0.11 | 1 | 0.8% | |
| 0.17 | 1 | 0.8% | |
| 0.3 | 2 | 1.5% | |
| 0.39 | 1 | 0.8% |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 20 | 1 | 0.8% | |
| 17.52 | 1 | 0.8% | |
| 14.3 | 1 | 0.8% | |
| 10.23 | 1 | 0.8% | |
| 9.8 | 1 | 0.8% |
Start_Lon
Real number (ℝ)
| Distinct | 72 |
|---|---|
| Distinct (%) | 54.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -38.77506682 |
|---|---|
| Minimum | -74.01405 |
| Maximum | 1 |
| Zeros | 2 |
| Zeros (%) | 1.5% |
| Memory size | 1.0 KiB |
Quantile statistics
| Minimum | -74.01405 |
|---|---|
| 5-th percentile | -74.00329105 |
| Q1 | -73.9818405 |
| median | -73.943541 |
| Q3 | 1 |
| 95-th percentile | 1 |
| Maximum | 1 |
| Range | 75.01405 |
| Interquartile range (IQR) | 74.9818405 |
Descriptive statistics
| Standard deviation | 37.5456034 |
|---|---|
| Coefficient of variation (CV) | -0.9682924228 |
| Kurtosis | -2.015634718 |
| Mean | -38.77506682 |
| Median Absolute Deviation (MAD) | 0.0677955 |
| Skewness | 0.1228656569 |
| Sum | -5118.308821 |
| Variance | 1409.672334 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 60 | 45.5% | |
| 0 | 2 | 1.5% | |
| -73.977753 | 1 | 0.8% | |
| -73.99635 | 1 | 0.8% | |
| -73.999939 | 1 | 0.8% | |
| -73.995642 | 1 | 0.8% | |
| -73.977357 | 1 | 0.8% | |
| -73.971326 | 1 | 0.8% | |
| -73.98718 | 1 | 0.8% | |
| -73.992438 | 1 | 0.8% | |
| Other values (62) | 62 | 47.0% |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| -74.01405 | 1 | 0.8% | |
| -74.013227 | 1 | 0.8% | |
| -74.009446 | 1 | 0.8% | |
| -74.007549 | 1 | 0.8% | |
| -74.005867 | 1 | 0.8% |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 60 | 45.5% | |
| 0 | 2 | 1.5% | |
| -73.7767 | 1 | 0.8% | |
| -73.787442 | 1 | 0.8% | |
| -73.934798 | 1 | 0.8% |
Start_Lat
Categorical
| Distinct | 72 |
|---|---|
| Distinct (%) | 54.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.0 KiB |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| N | 60 | 45.5% | |
| 0 | 2 | 1.5% | |
| 40.779775999999998 | 1 | 0.8% | |
| 40.763660999999999 | 1 | 0.8% | |
| 40.750458000000002 | 1 | 0.8% | |
| 40.711706999999997 | 1 | 0.8% | |
| 40.729084 | 1 | 0.8% | |
| 40.737569999999998 | 1 | 0.8% | |
| 40.757976999999997 | 1 | 0.8% | |
| 40.738596999999999 | 1 | 0.8% | |
| Other values (62) | 62 | 47.0% |
Frequencies of value counts
Unique
| Unique | 70 |
|---|---|
| Unique (%) | 53.0% |
Histogram of lengths of the category
Length
| Max length | 18 |
|---|---|
| Median length | 9 |
| Mean length | 9.310606061 |
| Min length | 1 |
Rate_Code
Real number (ℝ≥0)
| Distinct | 30 |
|---|---|
| Distinct (%) | 25.0% |
| Missing | 12 |
| Missing (%) | 9.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 80.43333333 |
|---|---|
| Minimum | 1 |
| Maximum | 264 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.0 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 21.5 |
| Q3 | 145 |
| 95-th percentile | 239 |
| Maximum | 264 |
| Range | 263 |
| Interquartile range (IQR) | 144 |
Descriptive statistics
| Standard deviation | 90.39267589 |
|---|---|
| Coefficient of variation (CV) | 1.123821084 |
| Kurtosis | -1.169811989 |
| Mean | 80.43333333 |
| Median Absolute Deviation (MAD) | 20.5 |
| Skewness | 0.5796280224 |
| Sum | 9652 |
| Variance | 8170.835854 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=30)
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 59 | 44.7% | |
| 145 | 17 | 12.9% | |
| 161 | 4 | 3.0% | |
| 239 | 3 | 2.3% | |
| 230 | 3 | 2.3% | |
| 151 | 2 | 1.5% | |
| 193 | 2 | 1.5% | |
| 95 | 2 | 1.5% | |
| 48 | 2 | 1.5% | |
| 238 | 2 | 1.5% | |
| Other values (20) | 24 | 18.2% | |
| (Missing) | 12 | 9.1% |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 59 | 44.7% | |
| 2 | 1 | 0.8% | |
| 41 | 2 | 1.5% | |
| 45 | 1 | 0.8% | |
| 48 | 2 | 1.5% |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 264 | 1 | 0.8% | |
| 263 | 1 | 0.8% | |
| 262 | 1 | 0.8% | |
| 249 | 1 | 0.8% | |
| 239 | 3 | 2.3% |
store_and_forward
Categorical
| Distinct | 37 |
|---|---|
| Distinct (%) | 37.4% |
| Missing | 33 |
| Missing (%) | 25.0% |
| Memory size | 1.0 KiB |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| N | 32 | 24.2% | |
| 145 | 16 | 12.1% | |
| 0 | 6 | 4.5% | |
| 161 | 3 | 2.3% | |
| 239 | 3 | 2.3% | |
| 7 | 2 | 1.5% | |
| 24 | 2 | 1.5% | |
| 193 | 2 | 1.5% | |
| 246 | 2 | 1.5% | |
| 234 | 2 | 1.5% | |
| Other values (27) | 29 | 22.0% | |
| (Missing) | 33 | 25.0% |
Frequencies of value counts
Unique
| Unique | 25 |
|---|---|
| Unique (%) | 25.3% |
Histogram of lengths of the category
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 2.303030303 |
| Min length | 1 |
End_Lon
Real number (ℝ)
| Distinct | 74 |
|---|---|
| Distinct (%) | 56.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -38.51594494 |
|---|---|
| Minimum | -74.015534 |
| Maximum | 3 |
| Zeros | 2 |
| Zeros (%) | 1.5% |
| Memory size | 1.0 KiB |
Quantile statistics
| Minimum | -74.015534 |
|---|---|
| 5-th percentile | -73.9958961 |
| Q1 | -73.9808765 |
| median | -73.93310747 |
| Q3 | 1.25 |
| 95-th percentile | 2 |
| Maximum | 3 |
| Range | 77.015534 |
| Interquartile range (IQR) | 75.2308765 |
Descriptive statistics
| Standard deviation | 37.82084394 |
|---|---|
| Coefficient of variation (CV) | -0.981952903 |
| Kurtosis | -2.015122876 |
| Mean | -38.51594494 |
| Median Absolute Deviation (MAD) | 0.07363402581 |
| Skewness | 0.123182447 |
| Sum | -5084.104732 |
| Variance | 1430.416237 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2 | 32 | 24.2% | |
| 1 | 27 | 20.5% | |
| 0 | 2 | 1.5% | |
| -73.976247 | 1 | 0.8% | |
| -73.986397 | 1 | 0.8% | |
| -73.9946 | 1 | 0.8% | |
| -73.983492 | 1 | 0.8% | |
| -73.973602 | 1 | 0.8% | |
| -73.957112 | 1 | 0.8% | |
| -73.976193 | 1 | 0.8% | |
| Other values (64) | 64 | 48.5% |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| -74.015534 | 1 | 0.8% | |
| -74.010464 | 1 | 0.8% | |
| -74.00999 | 1 | 0.8% | |
| -74.003493 | 1 | 0.8% | |
| -73.999419 | 1 | 0.8% |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 3 | 1 | 0.8% | |
| 2 | 32 | 24.2% | |
| 1 | 27 | 20.5% | |
| 0 | 2 | 1.5% | |
| -73.776308 | 1 | 0.8% |
End_Lat
Real number (ℝ≥0)
| Distinct | 100 |
|---|---|
| Distinct (%) | 75.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 25.50447503 |
|---|---|
| Minimum | 0 |
| Maximum | 40.811692 |
| Zeros | 2 |
| Zeros (%) | 1.5% |
| Memory size | 1.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2.5 |
| Q1 | 6.75 |
| median | 40.6932995 |
| Q3 | 40.7572485 |
| 95-th percentile | 40.78042395 |
| Maximum | 40.811692 |
| Range | 40.811692 |
| Interquartile range (IQR) | 34.0072485 |
Descriptive statistics
| Standard deviation | 16.90725235 |
|---|---|
| Coefficient of variation (CV) | 0.6629131683 |
| Kurtosis | -1.771954185 |
| Mean | 25.50447503 |
| Median Absolute Deviation (MAD) | 0.0963895 |
| Skewness | -0.3110408982 |
| Sum | 3366.590704 |
| Variance | 285.855182 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2.5 | 14 | 10.6% | |
| 3 | 5 | 3.8% | |
| 3.5 | 4 | 3.0% | |
| 14 | 3 | 2.3% | |
| 4.5 | 3 | 2.3% | |
| 0 | 2 | 1.5% | |
| 5.5 | 2 | 1.5% | |
| 13 | 2 | 1.5% | |
| 4 | 2 | 1.5% | |
| 7.5 | 2 | 1.5% | |
| Other values (90) | 93 | 70.5% |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 2 | 1.5% | |
| 2.5 | 14 | 10.6% | |
| 3 | 5 | 3.8% | |
| 3.5 | 4 | 3.0% | |
| 4 | 2 | 1.5% |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 40.811692 | 1 | 0.8% | |
| 40.8084 | 1 | 0.8% | |
| 40.792058 | 1 | 0.8% | |
| 40.78732 | 1 | 0.8% | |
| 40.785647 | 1 | 0.8% |
Payment_Type
Categorical
| Distinct | 12 |
|---|---|
| Distinct (%) | 9.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.0 KiB |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0.5 | 41 | 31.1% | |
| CRD | 27 | 20.5% | |
| CSH | 26 | 19.7% | |
| 3 | 15 | 11.4% | |
| CASH | 7 | 5.3% | |
| Credit | 4 | 3.0% | |
| Cas | 3 | 2.3% | |
| CAS | 3 | 2.3% | |
| 1 | 2 | 1.5% | |
| 0 | 2 | 1.5% | |
| Other values (2) | 2 | 1.5% |
Frequencies of value counts
Unique
| Unique | 2 |
|---|---|
| Unique (%) | 1.5% |
Histogram of lengths of the category
Length
| Max length | 6 |
|---|---|
| Median length | 3 |
| Mean length | 2.863636364 |
| Min length | 1 |
Fare_Amt
Real number (ℝ≥0)
| Distinct | 39 |
|---|---|
| Distinct (%) | 29.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5.872727273 |
|---|---|
| Minimum | 0.5 |
| Maximum | 45 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.0 KiB |
Quantile statistics
| Minimum | 0.5 |
|---|---|
| 5-th percentile | 0.5 |
| Q1 | 0.5 |
| median | 4.1 |
| Q3 | 8.2 |
| 95-th percentile | 16.9 |
| Maximum | 45 |
| Range | 44.5 |
| Interquartile range (IQR) | 7.7 |
Descriptive statistics
| Standard deviation | 7.910203006 |
|---|---|
| Coefficient of variation (CV) | 1.346938592 |
| Kurtosis | 10.16702135 |
| Mean | 5.872727273 |
| Median Absolute Deviation (MAD) | 3.6 |
| Skewness | 2.776624281 |
| Sum | 775.2 |
| Variance | 62.57131159 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=39)
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0.5 | 60 | 45.5% | |
| 6.9 | 6 | 4.5% | |
| 9.3 | 4 | 3.0% | |
| 4.5 | 4 | 3.0% | |
| 12.1 | 4 | 3.0% | |
| 6.5 | 4 | 3.0% | |
| 6 | 3 | 2.3% | |
| 4.1 | 3 | 2.3% | |
| 4.9 | 3 | 2.3% | |
| 45 | 2 | 1.5% | |
| Other values (29) | 39 | 29.5% |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0.5 | 60 | 45.5% | |
| 2.5 | 2 | 1.5% | |
| 2.9 | 1 | 0.8% | |
| 3.3 | 1 | 0.8% | |
| 3.7 | 1 | 0.8% |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 45 | 2 | 1.5% | |
| 39.5 | 1 | 0.8% | |
| 26.5 | 2 | 1.5% | |
| 18 | 2 | 1.5% | |
| 16 | 1 | 0.8% |
surcharge
Real number (ℝ≥0)
| Distinct | 19 |
|---|---|
| Distinct (%) | 14.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.5828030303 |
|---|---|
| Minimum | 0 |
| Maximum | 6.3 |
| Zeros | 70 |
| Zeros (%) | 53.0% |
| Memory size | 1.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0.5 |
| 95-th percentile | 2.4225 |
| Maximum | 6.3 |
| Range | 6.3 |
| Interquartile range (IQR) | 0.5 |
Descriptive statistics
| Standard deviation | 0.9905032373 |
|---|---|
| Coefficient of variation (CV) | 1.699550596 |
| Kurtosis | 10.13071067 |
| Mean | 0.5828030303 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.807869629 |
| Sum | 76.93 |
| Variance | 0.9810966632 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=19)
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 70 | 53.0% | |
| 0.5 | 30 | 22.7% | |
| 1 | 10 | 7.6% | |
| 2 | 5 | 3.8% | |
| 1.5 | 2 | 1.5% | |
| 4 | 2 | 1.5% | |
| 3.06 | 1 | 0.8% | |
| 2.75 | 1 | 0.8% | |
| 1.47 | 1 | 0.8% | |
| 2.4 | 1 | 0.8% | |
| Other values (9) | 9 | 6.8% |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 70 | 53.0% | |
| 0.5 | 30 | 22.7% | |
| 0.7 | 1 | 0.8% | |
| 1 | 10 | 7.6% | |
| 1.14 | 1 | 0.8% |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 6.3 | 1 | 0.8% | |
| 4 | 2 | 1.5% | |
| 3.95 | 1 | 0.8% | |
| 3.06 | 1 | 0.8% | |
| 2.75 | 1 | 0.8% |
mta_tax
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 1.6% |
| Missing | 10 |
| Missing (%) | 7.6% |
| Memory size | 1.0 KiB |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0.5 | 62 | 47.0% | |
| 0 | 60 | 45.5% | |
| (Missing) | 10 | 7.6% |
Tip_Amt
Real number (ℝ≥0)
| Distinct | 21 |
|---|---|
| Distinct (%) | 15.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.7362878788 |
|---|---|
| Minimum | 0 |
| Maximum | 10.1 |
| Zeros | 41 |
| Zeros (%) | 31.1% |
| Memory size | 1.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0.3 |
| Q3 | 0.3 |
| 95-th percentile | 2.725 |
| Maximum | 10.1 |
| Range | 10.1 |
| Interquartile range (IQR) | 0.3 |
Descriptive statistics
| Standard deviation | 1.466532705 |
|---|---|
| Coefficient of variation (CV) | 1.991792542 |
| Kurtosis | 21.02805066 |
| Mean | 0.7362878788 |
| Median Absolute Deviation (MAD) | 0.3 |
| Skewness | 4.173127027 |
| Sum | 97.19 |
| Variance | 2.150718176 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=21)
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0.3 | 60 | 45.5% | |
| 0 | 41 | 31.1% | |
| 1 | 7 | 5.3% | |
| 2 | 3 | 2.3% | |
| 2.12 | 2 | 1.5% | |
| 3 | 2 | 1.5% | |
| 2.4 | 2 | 1.5% | |
| 2.5 | 2 | 1.5% | |
| 1.3 | 1 | 0.8% | |
| 9 | 1 | 0.8% | |
| Other values (11) | 11 | 8.3% |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 41 | 31.1% | |
| 0.3 | 60 | 45.5% | |
| 1 | 7 | 5.3% | |
| 1.18 | 1 | 0.8% | |
| 1.3 | 1 | 0.8% |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 10.1 | 1 | 0.8% | |
| 9 | 1 | 0.8% | |
| 6.75 | 1 | 0.8% | |
| 3.8 | 1 | 0.8% | |
| 3.7 | 1 | 0.8% |
Tolls_Amt
Real number (ℝ≥0)
| Distinct | 40 |
|---|---|
| Distinct (%) | 30.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5.31 |
|---|---|
| Minimum | 0 |
| Maximum | 33.3 |
| Zeros | 69 |
| Zeros (%) | 52.3% |
| Memory size | 1.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 8.1875 |
| 95-th percentile | 22.475 |
| Maximum | 33.3 |
| Range | 33.3 |
| Interquartile range (IQR) | 8.1875 |
Descriptive statistics
| Standard deviation | 7.813808388 |
|---|---|
| Coefficient of variation (CV) | 1.471527003 |
| Kurtosis | 2.020590915 |
| Mean | 5.31 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 1.640432595 |
| Sum | 700.92 |
| Variance | 61.05560153 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=40)
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 69 | 52.3% | |
| 3.8 | 12 | 9.1% | |
| 4.3 | 5 | 3.8% | |
| 4.8 | 3 | 2.3% | |
| 8.3 | 2 | 1.5% | |
| 12.3 | 2 | 1.5% | |
| 4.57 | 2 | 1.5% | |
| 16.3 | 2 | 1.5% | |
| 20.3 | 2 | 1.5% | |
| 8.8 | 2 | 1.5% | |
| Other values (30) | 31 | 23.5% |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 69 | 52.3% | |
| 3.8 | 12 | 9.1% | |
| 4.15 | 1 | 0.8% | |
| 4.3 | 5 | 3.8% | |
| 4.57 | 2 | 1.5% |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 33.3 | 1 | 0.8% | |
| 31.6 | 1 | 0.8% | |
| 27.8 | 1 | 0.8% | |
| 26.3 | 1 | 0.8% | |
| 25.3 | 1 | 0.8% |
Total_Amt
Real number (ℝ≥0)
| Distinct | 62 |
|---|---|
| Distinct (%) | 58.5% |
| Missing | 26 |
| Missing (%) | 19.7% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 8.784716981 |
|---|---|
| Minimum | 0 |
| Maximum | 58.15 |
| Zeros | 18 |
| Zeros (%) | 13.6% |
| Memory size | 1.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 2.5 |
| median | 7 |
| Q3 | 11.595 |
| 95-th percentile | 22.65 |
| Maximum | 58.15 |
| Range | 58.15 |
| Interquartile range (IQR) | 9.095 |
Descriptive statistics
| Standard deviation | 10.0105085 |
|---|---|
| Coefficient of variation (CV) | 1.139536825 |
| Kurtosis | 9.762040402 |
| Mean | 8.784716981 |
| Median Absolute Deviation (MAD) | 4.5 |
| Skewness | 2.752763239 |
| Sum | 931.18 |
| Variance | 100.2102804 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 18 | 13.6% | |
| 2.5 | 16 | 12.1% | |
| 7 | 3 | 2.3% | |
| 6.9 | 2 | 1.5% | |
| 10.62 | 2 | 1.5% | |
| 8.1 | 2 | 1.5% | |
| 8.3 | 2 | 1.5% | |
| 8 | 2 | 1.5% | |
| 15 | 2 | 1.5% | |
| 9.8 | 2 | 1.5% | |
| Other values (52) | 55 | 41.7% | |
| (Missing) | 26 | 19.7% |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 18 | 13.6% | |
| 2.5 | 16 | 12.1% | |
| 3 | 1 | 0.8% | |
| 3.4 | 1 | 0.8% | |
| 3.5 | 1 | 0.8% |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 58.15 | 1 | 0.8% | |
| 50.6 | 1 | 0.8% | |
| 50.07 | 1 | 0.8% | |
| 33.75 | 1 | 0.8% | |
| 31.07 | 1 | 0.8% |
Pearson's r
Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
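The calculation described above can be sketched in plain Python. This is an illustration, not the code pandas-profiling runs; `pearson_r` is a hypothetical helper, and the sample values are Trip_Distance and Fare_Amt from the ten rows in the "First rows" table of this report.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # The 1/(n - 1) factors of the covariance and the standard deviations cancel.
    return cov / math.sqrt(var_x * var_y)

# Trip_Distance and Fare_Amt from the ten "First rows" of this report.
distance = [2.63, 1.60, 0.30, 0.49, 0.30, 17.52, 1.32, 1.80, 10.23, 2.02]
fare = [8.9, 6.9, 4.1, 4.1, 3.3, 45.0, 6.1, 6.9, 26.5, 8.1]

r = pearson_r(distance, fare)
# Shifting and positively rescaling one variable leaves r unchanged.
r_rescaled = pearson_r(distance, [3.0 * f + 7.0 for f in fare])
print(round(r, 4), round(r_rescaled, 4))
```

The second call illustrates the invariance property: applying 3·Fare_Amt + 7 changes the units but not the coefficient.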
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
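A minimal sketch of that rank-based calculation, assuming tied values receive the average of their ranks (a common convention; the report does not state which one is used). `average_ranks`, `pearson_r` and `spearman_rho` are illustrative helpers.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

def spearman_rho(xs, ys):
    """Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear
print(spearman_rho(x, y), pearson_r(x, y))
```

On this monotonic but nonlinear toy input the rank-based coefficient reaches its maximum while Pearson's r stays below it, which is the difference the paragraph describes.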
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
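The pair counting can be sketched directly from that definition. This is the plain form without a tie correction (often called τ-a); libraries commonly apply a tie-corrected variant (τ-b), so results on tied data may differ. `kendall_tau` is an illustrative helper.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # direction == 0 is a tie: counted as neither concordant nor discordant.
    return (concordant - discordant) / len(pairs)

print(kendall_tau([1, 2, 3], [1, 3, 2]))
```

In the example, two of the three pairs are ordered the same way in both variables and one is reversed, so the coefficient is positive but below 1.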
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
First rows
| | vendor_name | Trip_Pickup_DateTime | Trip_Dropoff_DateTime | Passenger_Count | Trip_Distance | Start_Lon | Start_Lat | Rate_Code | store_and_forward | End_Lon | End_Lat | Payment_Type | Fare_Amt | surcharge | mta_tax | Tip_Amt | Tolls_Amt | Total_Amt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | VTS | 2009-01-04 02:52:00 | 2009-01-04 03:02:00 | 1 | 2.63 | -73.991957 | 40.721567 | NaN | NaN | -73.993803 | 40.695922 | CASH | 8.9 | 0.5 | NaN | 0.0 | 0.00 | 9.40 |
| 1 | DDS | 2009-02-03 08:25:00 | 2009-02-03 08:33:39 | 1 | 1.60 | -73.992768 | 40.758324999999999 | NaN | NaN | -73.994710 | 40.739723 | CASH | 6.9 | 0.0 | NaN | 0.0 | 0.00 | 6.90 |
| 2 | CMT | 2009-03-26 15:30:14 | 2009-03-26 15:33:45 | 1 | 0.30 | -73.970709 | 40.796382000000001 | NaN | 0 | -73.973602 | 40.792058 | Cash | 4.1 | 0.0 | NaN | 0.0 | 0.00 | 4.10 |
| 3 | VTS | 2009-04-08 12:19:00 | 2009-04-08 12:24:00 | 1 | 0.49 | -73.974467 | 40.760793 | NaN | NaN | -73.966770 | 40.757057 | CASH | 4.1 | 0.0 | NaN | 0.0 | 0.00 | 4.10 |
| 4 | CMT | 2009-05-27 07:41:05 | 2009-05-27 07:42:28 | 1 | 0.30 | -73.974105 | 40.742891999999998 | NaN | 0 | -73.973769 | 40.746405 | Credit | 3.3 | 0.0 | NaN | 1.0 | 0.00 | 4.30 |
| 5 | VTS | 2009-06-14 23:23:00 | 2009-06-14 23:48:00 | 1 | 17.52 | -73.787442 | 40.641525000000001 | NaN | NaN | -73.980072 | 40.742963 | Credit | 45.0 | 0.0 | NaN | 9.0 | 4.15 | 58.15 |
| 6 | VTS | 2009-07-15 17:39:00 | 2009-07-15 17:46:00 | 1 | 1.32 | -73.999132 | 40.726542000000002 | NaN | NaN | -73.984907 | 40.736347 | Credit | 6.1 | 1.0 | NaN | 1.0 | 0.00 | 8.10 |
| 7 | VTS | 2009-08-12 07:28:00 | 2009-08-12 07:36:00 | 1 | 1.80 | 0.000000 | 0 | NaN | NaN | 0.000000 | 0.000000 | CASH | 6.9 | 0.0 | NaN | 0.0 | 0.00 | 6.90 |
| 8 | VTS | 2009-09-24 09:00:00 | 2009-09-24 09:29:00 | 1 | 10.23 | -73.978968 | 40.766173000000002 | NaN | NaN | -73.872268 | 40.774530 | CASH | 26.5 | 0.0 | NaN | 0.0 | 4.57 | 31.07 |
| 9 | VTS | 2009-10-26 13:06:00 | 2009-10-26 13:17:00 | 5 | 2.02 | -73.967242 | 40.803224999999998 | NaN | NaN | -73.957517 | 40.783533 | CASH | 8.1 | 0.0 | NaN | 0.0 | 0.00 | 8.10 |
Last rows
| | vendor_name | Trip_Pickup_DateTime | Trip_Dropoff_DateTime | Passenger_Count | Trip_Distance | Start_Lon | Start_Lat | Rate_Code | store_and_forward | End_Lon | End_Lat | Payment_Type | Fare_Amt | surcharge | mta_tax | Tip_Amt | Tolls_Amt | Total_Amt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 122 | 1 | 2020-02-01 00:17:35 | 2020-02-01 00:30:32 | 1 | 2.6 | 1.0 | N | 145.0 | 7 | 1.0 | 11.0 | 0.5 | 0.5 | 2.45 | 0.0 | 0.3 | 14.75 | 0.0 |
| 123 | 1 | 2020-02-01 00:32:47 | 2020-02-01 01:05:36 | 1 | 4.8 | 1.0 | N | 45.0 | 61 | 1.0 | 21.5 | 3 | 0.5 | 6.30 | 0.0 | 0.3 | 31.60 | 2.5 |
| 124 | 1 | 2020-03-01 00:31:13 | 2020-03-01 01:01:42 | 1 | 4.7 | 1.0 | N | 88.0 | 255 | 1.0 | 22.0 | 3 | 0.5 | 2.00 | 0.0 | 0.3 | 27.80 | 2.5 |
| 125 | 2 | 2020-03-01 00:08:22 | 2020-03-01 00:08:49 | 1 | 0.0 | 1.0 | N | 193.0 | 193 | 2.0 | 2.5 | 0.5 | 0.5 | 0.00 | 0.0 | 0.3 | 3.80 | 0.0 |
| 126 | 1 | 2020-04-01 00:41:22 | 2020-04-01 01:01:53 | 1 | 1.2 | 1.0 | N | 41.0 | 24 | 2.0 | 5.5 | 0.5 | 0.5 | 0.00 | 0.0 | 0.3 | 6.80 | 0.0 |
| 127 | 1 | 2020-04-01 00:56:00 | 2020-04-01 01:09:25 | 1 | 3.4 | 1.0 | N | 95.0 | 197 | 1.0 | 12.5 | 0.5 | 0.5 | 2.75 | 0.0 | 0.3 | 16.55 | 0.0 |
| 128 | 1 | 2020-05-01 00:02:28 | 2020-05-01 00:18:07 | 1 | 0.0 | 1.0 | N | 234.0 | 256 | 1.0 | 12.2 | 3 | 0.5 | 2.40 | 0.0 | 0.3 | 18.40 | 2.5 |
| 129 | 1 | 2020-05-01 00:23:21 | 2020-05-01 00:26:01 | 2 | 0.4 | 1.0 | N | 264.0 | 264 | 1.0 | 4.0 | 0.5 | 0.5 | 0.50 | 0.0 | 0.3 | 5.80 | 0.0 |
| 130 | 1 | 2020-06-01 00:31:23 | 2020-06-01 00:49:58 | 1 | 3.6 | 1.0 | N | 140.0 | 68 | 1.0 | 15.5 | 3 | 0.5 | 4.00 | 0.0 | 0.3 | 23.30 | 2.5 |
| 131 | 1 | 2020-06-01 00:42:50 | 2020-06-01 01:04:33 | 1 | 5.6 | 1.0 | N | 79.0 | 226 | 1.0 | 19.5 | 3 | 0.5 | 2.00 | 0.0 | 0.3 | 25.30 | 2.5 |